Aggregations: add a min_doc_count option to terms and histogram #4662

Closed
jpountz opened this issue Jan 8, 2014 · 3 comments

@jpountz
Contributor

jpountz commented Jan 8, 2014

Right now, terms aggregations return every term that matches at least one hit. The purpose of the min_doc_count option is to make that threshold configurable. For example, if

{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tag"
            }
        }
    }
}

returns

{
    ...

    "aggregations" : {
        "tags" : {
            "buckets" : [
                {
                    "key" : "search",
                    "doc_count" : 115
                },
                {
                    "key" : "java",
                    "doc_count" : 50
                },
                {
                    "key" : "concurrency",
                    "doc_count" : 12
                }
            ]
        }
    }
}

then

{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tag",
                "min_doc_count": 50
            }
        }
    }
}

would return

{
    ...

    "aggregations" : {
        "tags" : {
            "buckets" : [
                {
                    "key" : "search",
                    "doc_count" : 115
                },
                {
                    "key" : "java",
                    "doc_count" : 50
                }
            ]
        }
    }
}
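The filtering rule in this example can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not the Elasticsearch implementation: `filter_buckets` is a hypothetical helper, and the counts are the ones from the example response above.

```python
def filter_buckets(doc_counts, min_doc_count=1):
    """Keep only the terms whose doc_count reaches min_doc_count.

    doc_counts maps each term to the number of hits it matches.
    Hypothetical helper: it mimics the proposed option, nothing more.
    """
    buckets = [
        {"key": key, "doc_count": count}
        for key, count in doc_counts.items()
        if count >= min_doc_count
    ]
    # Order by descending doc_count, as in the example responses.
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)


counts = {"search": 115, "java": 50, "concurrency": 12}

# The threshold is inclusive: "java" (50 hits) stays, "concurrency" (12) is dropped.
print(filter_buckets(counts, min_doc_count=50))
```

With `min_doc_count=1`, the proposed default, all three terms are kept, which matches the current behaviour.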

The special case min_doc_count: 0 will behave similarly to the all_terms option of facets and also return terms that don't match any hit. For example, we could have the following response:

{
    ...

    "aggregations" : {
        "tags" : {
            "buckets" : [
                {
                    "key" : "search",
                    "doc_count" : 115
                },
                {
                    "key" : "java",
                    "doc_count" : 50
                },
                {
                    "key" : "concurrency",
                    "doc_count" : 12
                },
                {
                    "key" : "unit testing",
                    "doc_count" : 0
                },
                {
                    "key" : "performance",
                    "doc_count" : 0
                }
            ]
        }
    }
}
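What makes min_doc_count: 0 special is that the candidate terms have to come from the whole field rather than from the matching hits alone. A minimal Python sketch of that idea follows, assuming the full list of terms in the field is available as `all_terms`; both that name and the helper `terms_with_zero_counts` are hypothetical and only illustrate the behaviour described above.

```python
def terms_with_zero_counts(all_terms, doc_counts, min_doc_count=1):
    """Build buckets from every term in the field, not just the matched ones.

    all_terms  : every term present in the field (assumed to be known)
    doc_counts : term -> number of matching hits for the current query
    """
    buckets = []
    for term in all_terms:
        count = doc_counts.get(term, 0)  # unmatched terms count as 0
        if count >= min_doc_count:
            buckets.append({"key": term, "doc_count": count})
    # Stable sort: terms with equal counts keep their input order.
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)


all_terms = ["search", "java", "concurrency", "unit testing", "performance"]
doc_counts = {"search": 115, "java": 50, "concurrency": 12}

for bucket in terms_with_zero_counts(all_terms, doc_counts, min_doc_count=0):
    print(bucket)
```

Having to walk every term of the field, instead of only the terms seen in the hits, is one plausible reason why 0 would be more expensive than 1.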

Histograms are going to support this option as well, and the compute_empty_buckets option will be removed in favor of min_doc_count: 0.
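For histograms, the same option could look like the Python sketch below. It assumes that empty buckets are filled in only between the smallest and largest non-empty keys; the issue does not spell this out, so treat that as an assumption, and `histogram_buckets` as a hypothetical helper rather than the actual implementation.

```python
from collections import Counter


def histogram_buckets(values, interval, min_doc_count=1):
    """Bucket integer values by interval and apply min_doc_count.

    Assumption: with min_doc_count=0, gaps are filled only between the
    lowest and highest non-empty bucket keys.
    """
    # Each value falls into the bucket whose key is the interval floor.
    counts = Counter((v // interval) * interval for v in values)
    if not counts:
        return []
    if min_doc_count == 0:
        keys = range(min(counts), max(counts) + interval, interval)
    else:
        keys = sorted(counts)
    # Counter returns 0 for missing keys, so gap buckets get doc_count 0.
    return [
        {"key": k, "doc_count": counts[k]}
        for k in keys
        if counts[k] >= min_doc_count
    ]


values = [3, 7, 12, 48]
print(histogram_buckets(values, interval=10))                   # non-empty buckets only
print(histogram_buckets(values, interval=10, min_doc_count=0))  # gaps at 20 and 30 included
```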

@bobrik
Contributor

bobrik commented Jan 8, 2014

If min_doc_count is going to be 0 by default it would be great.

@ghost ghost assigned jpountz Jan 8, 2014
@uboness
Contributor

uboness commented Jan 8, 2014

@bobrik currently the plan is to have 1 as the default, since 0 comes with extra performance costs and is not necessarily the right default (it really depends on the use case).

@bobrik
Contributor

bobrik commented Jan 8, 2014

Sorry, I actually meant 1.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Jan 10, 2014
`min_doc_count` is the minimum number of hits that a term or histogram key
should match in order to appear in the response.

`min_doc_count=0` replaces `compute_empty_buckets` for histograms and will
behave exactly like facets' `all_terms=true` for terms aggregations.

Close elastic#4662
brusic pushed a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014
`min_doc_count` is the minimum number of hits that a term or histogram key
should match in order to appear in the response.

`min_doc_count=0` replaces `compute_empty_buckets` for histograms and will
behave exactly like facets' `all_terms=true` for terms aggregations.

Close elastic#4662

3 participants